Improve search: multi-term AND + relevance ranking (FTS spike) #95
rdhyee wants to merge 2 commits into isamplesorg:main
Search improvements (immediate):
- Multi-term search: "pottery Cyprus" requires BOTH words to match
- Relevance ranking: label matches weighted 3x, place 2x, description 1x
- Results sorted by relevance score when searching (random for browsing)

FTS spike (future path, documented):
- Added tools/build_fts_index.py to build DuckDB FTS index offline
- Tested: 358 MB full index, 211 MB lite; both too large for auto-download
- BM25 scoring works correctly (Porter stemming, stopwords)
- Next step: explore smaller index strategies or on-demand loading

Closes isamplesorg#84 (spike complete; findings documented in PR)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
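The matching and ranking rules listed in this commit message can be sketched in plain Python. This is only an illustration of the semantics, not the PR's actual code, which builds SQL for DuckDB-WASM. The field names `label`, `place`, and `description`, the helper names, and the rule that a term may match in any field and that scores add up per term are all assumptions made for the sketch.

```python
# Hypothetical weights taken from the commit message: label 3x, place 2x, description 1x.
WEIGHTS = {"label": 3, "place": 2, "description": 1}


def split_terms(query):
    """Split a query on whitespace and lowercase each term."""
    return query.lower().split()


def relevance(record, query):
    """Return a score > 0 only if EVERY term matches at least one field.

    Assumption: each term adds the weight of every field that contains it.
    """
    terms = split_terms(query)
    if not terms:
        return 0
    score = 0
    for term in terms:
        term_score = sum(
            weight
            for field, weight in WEIGHTS.items()
            if term in (record.get(field) or "").lower()
        )
        if term_score == 0:
            return 0  # AND semantics: one missing term rejects the record
        score += term_score
    return score


def search(records, query):
    """Keep records that match all terms, highest relevance first."""
    scored = [(relevance(r, query), r) for r in records]
    hits = [(s, r) for s, r in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits]
```

With a query such as "pottery Cyprus", a record that mentions pottery but is located in Crete is dropped, while records matching both words are ordered so that a label match outranks a description-only match.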
Search input was passed into ILIKE patterns with only single-quote escaping, so a literal "%" or "_" in the query (e.g. "100%", "co_op") silently turned into wildcards. Escape % _ \ and add ESCAPE '\' in both whereClause and the relevance-score expression.

Also reframe tools/build_fts_index.py as a spike artifact: the docstring told readers to upload the index to data.isamples.org, but per PR isamplesorg#95 findings the 200-358 MB result is too large to ship. Mark the script NOT in production pipeline and drop the misleading upload instructions.

Smoke-tested locally with /tmp/explorer_smoke_test.py (multi-term "pottery cyprus" + wildcard "100%"): 0 JS exceptions, 0 console errors, 0 failed requests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
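The wildcard escaping described in this commit can be illustrated with a small, self-contained sketch. It is not the PR's code: the helper names are hypothetical, and Python's built-in `sqlite3` (whose `LIKE` is case-insensitive for ASCII and accepts an `ESCAPE` clause) stands in for DuckDB's `ILIKE`. The sketch also uses bound parameters instead of string interpolation, which is a different choice from the single-quote escaping the commit mentions.

```python
import sqlite3


def escape_like(term):
    """Escape the backslash first, then the LIKE wildcards % and _."""
    return (
        term.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )


def contains_pattern(term):
    """Build a 'contains' pattern in which the user's text is literal."""
    return "%" + escape_like(term) + "%"


def matches(text, term):
    """Check the pattern with SQLite's LIKE ... ESCAPE '\\' (stand-in for ILIKE)."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute(
            "SELECT ? LIKE ? ESCAPE '\\'",
            (text, contains_pattern(term)),
        ).fetchone()
        return bool(row[0])
    finally:
        conn.close()
```

Without the escaping step, a query of "100%" would produce the pattern `%100%%` and match "1000 items"; with it, only text containing the literal characters `100%` matches.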
Reviewed and pushed two small follow-ups (134aca2):

1. ILIKE wildcard escaping. Search input was passed into the […]
2. FTS spike script header. […]

Smoke test ([…]
Exercised: initial load, multi-term search ([…]

Other notes from review (not blocking): […]

LGTM to merge once you've eyeballed the diff.
Summary
Closes #84 — FTS spike complete with immediate search improvements and documented future path.
Shipped now (zero new dependencies):
FTS spike findings:
- tools/build_fts_index.py
- ATTACH over HTTP in DuckDB-WASM is supported, but downloading 200-358 MB is impractical

Recommended next steps (not in this PR):
Test plan
- tools/build_fts_index.py runs successfully with local parquet

🤖 Generated with Claude Code